Greedy Algorithm

Greedy algorithm for NP-hard problems has performance guarantee under certain conditions. It achieves optimum for many other problems.

NP-Hard Problem

Set Cover Problem: Find a subset $S \subset X$ with the highest score, $ f(S) $, where: $$ f: 2^{X} \to \mathbb{R} $$ Do we have to check all subsets to find the best one?

Theorem

The greedy algorithm starts with $ S = \emptyset $. It picks one element $ x $ at each step, where:

$$ \begin{aligned} x = \arg \max_{x\in X \backslash S} \left[ f(S \cup \{x\}) - f(S) \right] \end{aligned} $$

If $ f $ subjects to submodularity and monotonicity, then the algorithm guarantees $ (1 - \frac{1}{e}) $ of the optimum.

Proof

Suppose $S_i$ is the set chosen by greedy algorithm at step $i$.

Suppose $S^{*} = \{x_1^{*}, \ldots, x_k^{*} \}$ is the optimal solution of size $k$.

Then:

$$ \begin{aligned} f(S^{*}) &\leq f(S^{*} \cup S_i) \quad \text{[monotonicity]} \\ &= f(\{x_1^{*}, \ldots, x_{k}^{*} \} \cup S_i) \\ &= f(\{x_1^{*}, \ldots, x_{k}^{*} \} \cup S_i) - f(\{x_1^{*}, \ldots, x_{k - 1}^{*} \} \cup S_i) + f(\{x_1^{*}, \ldots, x_{k - 1}^{*} \} \cup S_i) \\ &= \cdots \\ &= \sum_{j = 0}^{k} \left[ f(\{x_1^{*}, \ldots, x_{j}^{*} \} \cup S_i) - f(\{x_1^{*}, \ldots, x_{j - 1}^{*} \} \cup S_i) \right] + f(S_i) \\ &\leq \sum_{x^{*} \in S{^*}} \left[ f(\{x^{*}\} \cup S_i) - f(S_i) \right] + f(S_i) \quad \text{[submodularity]} \\ &\leq k \cdot \left[ f(S_{i+1}) - f(S_i) \right] + f(S_i) \quad \text{[greedy choice at step $i$]} \\ \end{aligned} $$

Accordingly: ($ k $: target size; $ l $: steps taken)

$$ \begin{aligned} f(S^{*}) - f(S_i) &\leq k \cdot \left[ f(S_{i+1}) - f(S_i) \right] \\ &\Updownarrow \\ \alpha_i &\leq k \cdot \left[ \alpha_{i} - \alpha_{i+1} \right] \\ \alpha_{i+1} &\leq (1 - \frac{1}{k}) \alpha_i \\ \alpha_{l} &\leq (1 - \frac{1}{k})^{l} \cdot \alpha_0 \\ &\leq (1 - \frac{1}{k})^{l} \cdot f(S^{*}) \\ &\Updownarrow \\ f(S^{*}) - f(S_l) &\leq (1 - \frac{1}{k})^{l} \cdot f(S^{*}) \\ &\leq e^{-\frac{l}{k}} \cdot f(S^{*}) \\ f(S_l) &\geq (1- e^{-\frac{l}{k}}) \cdot f(S^{*}) \end{aligned} $$

Result Analysis:

When $l = k$, greedy algorithm gives 63% approximation guarantee:

$$ \begin{aligned} f(S_k) \geq (1 - e^{-1}) \cdot f(S^{*}) \end{aligned} $$

When $l = 2k$, greedy algorithm gives 86% approximation guarantee:

$$ \begin{aligned} f(S_{2k}) &\geq (1 - e^{-2}) \cdot f(S^{*}) \end{aligned} $$

This means we can use more steps to assure better approximation.

Reference

When Greedy Algorithms are Good Enough